“It has nothing to do with http://spam.com ;-)”
The professor was shocked. She never expected any of her students to do this.
Her husband, nine-year-old daughter and eleven-year-old son were sitting with
her, and in front of her the computer monitor showed an e-mail from her
favorite student containing an obscene picture.


Well, this is an imaginary story, but if the mail spoofed from my Yahoo
account a week ago had been sent to the alumni mailing list, this could have
happened. I was flummoxed and frustrated to see the spoofed mail. I
mailed the ISP, read a few articles about spam, changed my .plan and posted a
mail in the in-house forum. But nothing happened. I wanted to kill all the
spammers. But I can’t. Instead I decided to kill the spam. So what is this
spam?


1 Introduction 
The Jargon File [1] defines spam as:


To mass-mail unrequested identical or nearly-identical email messages, 
particularly those containing advertising. Especially used when the mail 
addresses have been culled from network traffic or databases without the 
consent of the recipients. 


We won't talk about what spam is any more, as everyone knows it. So
let's look at the ways to prevent spam. First we will look into various
methods of filtering spam once it reaches the MTA. The sections after that
deal with more robust technologies which kill spam before it enters the
mail server.


2 Filtering Spam 


Spam filtering is done using rule based filtering methods and statistical
methods. The rule based methods look for regular expressions, or
combinations of them, in a message to filter spam. Statistical spam
filtering, on the other hand, is more robust and uses the probability of
occurrence of certain tokens in the mail.


2.1 Rule based Filtering - Server side 


This is the basic method for filtering spam, and it works by looking for
specific words found in spam mails. The probability of false
positives (marking legitimate mail as spam) is very high with this method.
It can be used in combination with the methods described in
the sections below for better results. It's advisable to do the filtering
at the server itself, as this avoids wasting bandwidth on
transporting unwanted spam to your local inbox.


2.1.1 Using .procmailrc + Sendmail or Postfix


Both Sendmail and Postfix can use procmail as the Mail Delivery Agent.
Procmail can be used to filter spam using rules (recipes) defined in
.procmailrc. A few examples are given below. These can be expanded to
suit your environment.


:0
* ^Received: .*spam\.net
/dev/null

:0
* ^From: .*spammer@nigerian-scam\.com
/dev/null


It's an ugly way of filtering, but it will work if you often get spam from
spam.net or from spammer@nigerian-scam.com. All mail from these sources
will be redirected (saved) to /dev/null. You can redirect the spam to a
regular file by replacing /dev/null with a file name.


:0
* ^Content-Type: .*multipart/
* 1^1 B ?? ^Content-Type: .*application/x-msdownload
* 1^1 B ?? ^Content-Type: .*name=.*\.(exe|scr|pif|com|bat)
/dev/null


Send those lil virii to /dev/null. Let them live in the time space void 
forever. 


:0
* ^(From|To|Sender|Reply-To):[ ]*.STRING-ADDED-BY-MAILSERVER
/dev/null


This will catch unqualified addresses, as unqualified addresses are
used only by spammers. STRING-ADDED-BY-MAILSERVER must be replaced
by the string added by the mail server to such mails.


:0
* ^Subject:.*REMOVE\ ME|\
^Subject:.*viagra
/dev/null


This is the simplest way of filtering: it is based on the subject line.
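
For readers who do not use procmail, the same idea can be sketched in a few lines of Python. This is only an illustration of header based rule matching, not a replacement for the recipes above; the RULES table and the is_spam function are names I made up for the example, and the patterns simply mirror the recipes. Procmail matches case-insensitively by default, so the sketch does the same.

```python
import re
from email import message_from_string

# Hypothetical rule table mirroring the procmail recipes above:
# (header name, pattern that marks the mail as spam).
RULES = [
    ("Received", re.compile(r"spam\.net", re.IGNORECASE)),
    ("From", re.compile(r"spammer@nigerian-scam\.com", re.IGNORECASE)),
    ("Subject", re.compile(r"REMOVE ME|viagra", re.IGNORECASE)),
]


def is_spam(raw_message: str) -> bool:
    """Return True if any header of the message matches one of the rules."""
    msg = message_from_string(raw_message)
    for header, pattern in RULES:
        # A header such as Received can occur several times.
        for value in msg.get_all(header, []):
            if pattern.search(value):
                return True
    return False
```

Note that a perfectly legitimate mail whose subject happens to contain one of the listed words is flagged too, which is exactly the false positive problem of rule based filtering.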


2.1.2 .procmailrc + Exim 


When Exim is used as the MTA, the filtering can be done by asking Exim to
invoke procmail via the .forward file. To enable this, add the following to
the .forward file.


"|IFS=' ' && p=`which procmail` && test -f $p && exec $p -Yf- || exit 75 #username"


Once this is added to the .forward file, the procmail recipes described above
will start working.


2.1.3 Drawbacks of rule based filtering


• The percentage of false positives is high (about 5-10%)
• Filtering is mostly English specific
• It's very difficult to automate the process, and it becomes impractical in
large establishments.


2.2 Statistical method for filtering


2.2.1 Bayesian Filtering


Bayesian spam filters calculate the spam score of a message by assigning
an actual probability to the tokens found in the mail. The value assigned
to a word may be positive or negative. These values are remembered, and
when the message is completely processed they are added up. The
resulting value may be positive or negative. If this value is greater than
or equal to the threshold value for a mail to be identified as spam, the mail
will be marked as spam. Unlike blind rule based filters, a Bayesian spam
filter learns from all the mail it sees. As time passes, the system adapts
itself and becomes more efficient and mature, and the number of false
positives also decreases. In general, Bayesian filtering sorts mail into good
and bad, spam and non-spam (ham), rather than only looking for spam. So even
if a ham arrives with words/tokens found in spam, it will still be recognized
as ham based on the overall probability of its tokens: the effect of “bad”
tokens will be nullified by innocent tokens. This can be exploited by
spammers by inserting innocent tokens into spam. (Recently I received spam
with such excellent poetry that I was forced to read it before deleting ;-).
Anyway, we can manually train the filter on such mail as spam, and that
“beautiful poetry” will end up in /dev/null.
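
The scoring described above can be sketched as a tiny naive Bayes classifier. This is my own minimal illustration, not the algorithm of any particular product: the class name TinyBayes, the whitespace tokenizer, the add-one smoothing and the threshold of 1.0 are all assumptions made for the example. Each token gets a log-odds value that is positive for "spammy" tokens and negative for "hammy" ones, and the values are summed and compared with the threshold.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-based spam scorer (illustrative only)."""

    def __init__(self):
        # Number of training messages of each kind that contained a token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of training messages of each kind.
        self.totals = {"spam": 0, "ham": 0}

    def learn(self, text: str, label: str) -> None:
        """Train on one message; label is 'spam' or 'ham'."""
        self.totals[label] += 1
        self.counts[label].update(set(text.lower().split()))

    def token_score(self, token: str) -> float:
        """Log-odds of a token: positive means spammy, negative means hammy."""
        # Add-one smoothing so unseen tokens score exactly 0.
        p_spam = (self.counts["spam"][token] + 1) / (self.totals["spam"] + 2)
        p_ham = (self.counts["ham"][token] + 1) / (self.totals["ham"] + 2)
        return math.log(p_spam / p_ham)

    def score(self, text: str) -> float:
        """Sum of the token scores of a message."""
        return sum(self.token_score(t) for t in set(text.lower().split()))

    def is_spam(self, text: str, threshold: float = 1.0) -> bool:
        """Spam if the summed score reaches the (hypothetical) threshold."""
        return self.score(text) >= threshold
```

Because innocent tokens pull the sum down, padding a spam with enough hammy words can push it below the threshold, which is the trick mentioned above; calling learn() on such a mail with the label 'spam' is the manual training that counters it.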


The advantages of statistical spam filters are given below.


1. They are effective

2. They generate few false positives

3. They learn

4. They let each user define what is spam

5. They are hard to trick


2.2.2 Spamassassin: The most popular Bayesian Filter


Spamassassin is developed by the Apache Software Foundation, and the latest
version of Spamassassin (version 3.0) uses Bayesian filtering to filter spam.
A brief overview of Spamassassin is given below.


Spamassassin comes in two main flavors: an on-demand scanner and
a daemon. The former can be invoked every time a message comes in,
and the latter runs continuously in memory and scans all incoming
messages. This article focuses on the latter approach. Spamassassin
is a complete set of tools which can fight spam by various methods.
It comprises three executables: spamassassin and spamd (Perl
scripts) and spamc (a C program). The Perl scripts run in “taint mode” for
security reasons. The C program spamc is intended to be called from
other programs. The basic Perl module associated with Spamassassin is
Mail::SpamAssassin (spam detector and markup engine), and its functionality
can be extended via plugins and add-on modules. Spamassassin
comes with a Bayes algorithm which “learns” to recognize new spam on
the basis of old messages (both spam and ham). This makes it possible for
the software to automatically adapt and identify spam even in the absence
of specific header or body tests. An (automatic) whitelist system makes it
easy to list e-mail addresses that you already know or have verified as valid;
messages from these senders are exempted from further filtering and
get routed directly to your mailbox.


2.2.3 How it works?


Spamassassin works by performing a range of tests on all the messages it
sees. A wide range of tests is provided, including checks to see whether the
sender address and IP, recipient address, message dates etc. are valid,
whether the message body contains any words from a list of forbidden words
stored locally, whether any of the sending servers are blacklisted, and so on.
Each test adds to a message’s overall spam score, and messages whose score
exceeds a certain user-defined threshold are treated as spam and can be either
deleted or marked with a special spam header for further processing by other
programs.
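
The "each test adds to the score" idea can be sketched as follows. This is not Spamassassin's code or rule set: the SpamTest class, the three tests, their point values and the threshold are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SpamTest:
    name: str
    points: float
    # A check receives the headers (as a dict) and the body text.
    check: Callable[[Dict[str, str], str], bool]


# Hypothetical tests with hypothetical scores.
TESTS: List[SpamTest] = [
    SpamTest("MISSING_DATE", 1.5, lambda h, b: "Date" not in h),
    SpamTest("FORBIDDEN_WORD", 2.5, lambda h, b: "viagra" in b.lower()),
    SpamTest("SUBJECT_ALL_CAPS", 1.0, lambda h, b: h.get("Subject", "").isupper()),
]


def evaluate(headers: Dict[str, str], body: str,
             threshold: float = 5.0) -> Tuple[float, bool, List[str]]:
    """Run every test, add up the points and compare with the threshold."""
    hits = [t for t in TESTS if t.check(headers, body)]
    score = sum(t.points for t in hits)
    # A message is treated as spam when its score exceeds the threshold.
    return score, score > threshold, [t.name for t in hits]
```

The list of test names returned by evaluate() is the kind of information a filter can write into a spam header, so that later programs (or the user) can see why a message was marked.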


Spamassassin can be configured on a mail server with an MTA like
Postfix, Exim or Sendmail. In such a setup the MTA accepts the mail and passes
it to the MDA, which gives Spamassassin control over the mail before
completing the processing. Spamassassin checks the mail headers, the body
of the message etc. and modifies the header of the message or
takes appropriate action based on the configuration file. Whenever a message
comes in, Spamassassin scans it using the rules in /etc/spamassassin.rules
or $ENV{HOME}/.spamassassin/user_prefs [2]. Finally the MDA delivers the mail
to the correct location.


The various actions that can be performed by Spamassassin are:


• Submit to a distributed spam detection network

• Mark as spam

• Move to a separate spam box

• Write the information collected in the form of tokens to various databases


Spamassassin is designed in such a way that more plugins can be added
to it very easily, and it can be combined with various mail clients like Mutt,
Outlook Express etc. In addition to the above-mentioned functionality,
Spamassassin supports Hashcash and SPF, can submit the mails marked
as spam to various distributed spam filtering networks like Vipul’s Razor,
and can perform lookups in various DNSBLs etc. [3] Hashcash and
SPF are discussed in later sections.


2.3 Integrating with mail servers 


In mail servers where the same binary handles the functions of both MTA
and MDA (e.g. Exim), the configuration is done in the main startup script of
the SMTP daemon. In the case of SMTP servers which make use of an
MDA like procmail, we can forward the mails to Spamassassin.


E.g. in the case of Postfix and Sendmail, add the following to the .procmailrc
of the user.


:0fw
| /usr/bin/spamc # adjust to the correct path to spamc


But this is a very basic configuration and is impractical on servers with
heavy traffic. We can use a program called amavisd-new as an interface
between the MTA and content filters (antivirus software, spam filters etc.).
A combination of Postfix, amavisd-new and Spamassassin is claimed to
be the best method to block both spam and viruses. More about the topic
can be found here:


tutorials/5561/3/ 


3 Blacklists - RBL 


Practically of no use. RBLs have been in existence for years, but they have
never worked beyond a certain level. Spammers are too fast and smart to be
blocked by RBLs. Moreover, an RBL can be used as a weapon against web hosting
service providers or companies by adding their IPs, or the domains hosted by
them, to the blacklists. The chance of generating false positives, and hence
causing harm, is higher with such blacklists. So we don’t discuss them here.


4 Filtering is brain dead


But I believe that spam “filtering” is a brain dead method to fight spam.
It's far, far better to prevent spam from being sent than to fight it
by filtering. This will save a lot of the network bandwidth and CPU time spent
processing spam. It is believed that more than 30% of the e-mails sent today
are spam. So in the following sections we will discuss methods other
than filtering to prevent spam.


4.1 HashCash 


Hashcash is a denial-of-service countermeasure tool. Its main current
use is to help hashcash users avoid losing e-mail due to content based
and blacklist based anti-spam systems. [4] Hashcash stamps each message
with an X-Hashcash: header, and filtering systems and blacklists are
encouraged to exempt mails with a valid stamp.


4.1.1 How it works?


The basic idea is that clients must do some work before they can send
mail (proof-of-work). They spend this proof of labour like money to get
service. Hashcash creates a stamp similar to an md5 sum, but it uses SHA1
to compute the stamp. The work required to compute the stamp can be
made arbitrarily expensive (from fractions of a second to hours). The
process described above is called minting. At the receiver's end, the receiver
can check stamps using the checking function, and if the proof-of-work value
is too low or bogus it simply rejects the mail. The validity of a stamp is by
default set to 28 days; after this period the stamp expires. This is
necessary, since if it is not enabled the available pool of stamps will be
exhausted within a short period.

Since each mail requires a considerable amount of work, it becomes very
hard for spammers to send spam in bulk, and as a result the total number
of spams decreases considerably.
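
Minting and checking can be sketched with nothing more than SHA1 from the Python standard library. This is a simplified illustration and not the hashcash tool itself: the stamp layout (version, bits, date, resource, empty extension, random salt, counter) follows the general shape of a hashcash stamp, but the decimal counter, the mint/check function names and the default of 12 bits are my own assumptions, and the 28 day expiry and double-spend checks are left out.

```python
import hashlib
import secrets
import time


def has_leading_zero_bits(stamp: str, bits: int) -> bool:
    """True if the SHA1 of the stamp starts with at least `bits` zero bits."""
    digest = hashlib.sha1(stamp.encode("ascii")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits) == 0


def mint(resource: str, bits: int = 12) -> str:
    """Do the work: search for a counter that gives the required zero bits."""
    date = time.strftime("%y%m%d")
    salt = secrets.token_hex(8)  # random part so that stamps are unique
    prefix = f"1:{bits}:{date}:{resource}::{salt}:"
    counter = 0
    while not has_leading_zero_bits(prefix + str(counter), bits):
        counter += 1  # each extra bit doubles the expected number of tries
    return prefix + str(counter)


def check(stamp: str, resource: str, min_bits: int = 12) -> bool:
    """Cheap verification at the receiver: one hash instead of thousands."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1" or not fields[1].isdigit():
        return False  # bogus stamp
    claimed_bits = int(fields[1])
    if fields[3] != resource or claimed_bits < min_bits:
        return False  # minted for someone else, or too little work
    return has_leading_zero_bits(stamp, claimed_bits)
```

A sender would call mint() with the recipient address and put the result in the X-Hashcash: header; the receiver calls check() with its own address. Raising the number of bits is how the cost is made "arbitrarily expensive".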


4.2 SPF: Sender Policy Framework 


This technology works by keeping a record of the locations (IPs) from where
a user sends mail. So e-mail spoofing becomes almost impossible. Even
if spammers want to send mail, they have to use their own identity to
do so, and by detecting the spam source we can simply block them. SPF
works by domains publishing reverse MX records to tell the world which
machines send mail from the domain. When receiving a message from a
domain, the recipient can check those records to make sure the mail is coming
from where it should be coming from. With SPF the reverse MX records
can be published with just one line in the DNS record. [5]


4.2.1 How to do it?


Add a single DNS record of type TXT to your DNS zone in the format
given below.


domainname.com. TXT v=SPF_version_identifier default_mechanism


This announces which computers are allowed to deliver e-mail from
your domains. It is checked by the receiving SMTP server before even
accepting the content (this significantly reduces bandwidth usage).
Detailed information about configuration for various scenarios like shared
hosting can be found at the implementer's site: http://spf.pobox.com


The working of the SPF protocol can be described in 3 steps. 


1. A user sends mail from sender.com or a spammer forges from sender.com 
to a user at receiver.com 


2. The SMTP server at receiver.com checks sender.com’s SPF record 


3. If the origin is not listed receiver.com gives the message a fail 
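
Step 3 can be sketched in Python, assuming the SPF record has already been fetched from DNS as a string. This is a deliberately tiny evaluator of my own: it understands only the ip4: and all mechanisms and the four qualifiers, and silently skips everything else (a, mx, include and so on), so it shows the idea and is not a complete SPF implementation. The record and the addresses in the usage note are documentation examples, not real data.

```python
import ipaddress

# SPF qualifiers and the result they produce when a mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check(record: str, sender_ip: str) -> str:
    """Evaluate a (simplified) SPF record against the connecting IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # no usable SPF record
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier is pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mechanism = term.lower()
        if mechanism == "all":
            return QUALIFIERS[qualifier]  # matches every sender
        if mechanism.startswith("ip4:"):
            network = ipaddress.ip_network(mechanism[4:], strict=False)
            if ip in network:
                return QUALIFIERS[qualifier]
        # Other mechanisms are ignored in this sketch.
    return "neutral"
```

With a record such as "v=spf1 ip4:192.0.2.0/24 -all", a connection from 192.0.2.10 passes, while a forged mail arriving from any other address gets a fail.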


SPF is probably not the end to Spam in total, but it might be the end to 
Spam as we know it today: coming in masses, spoofing e-mail addresses 
and most of all severely annoying. 


4.2.2 Possible flaw in SPF 


With the recent invention(?) of techniques like invisible bullet proof hosting,
in which certain hosting companies provide untraceable domains by
providing dynamically changing webspaces, SPF can be undermined to a certain
level. If the content filtering software installed on the server marks the
incoming mail as spam and adds the IP of the origin to spam blocking lists,
innocent end users will suffer. And it may bring the entire Internet to its
knees by corrupting the whole IP name space. (Note that a passing SPF check
doesn’t mean that the mail is not spam.)


4.2.3 An example: gmail.com


We can do a simple “dig gmail.com txt” from the command prompt
to see the SPF entries. The received mail contains a Received-SPF: field in
the header. A list of domains supporting SPF can be found here:


//personal.telefonica.terra.es/web/news/spf/ 


4.3 Domain Keys 


This is a method proposed by Yahoo.com to fight spam. The method
is fairly easy to understand, and there is no centralized authority, no
need to change the existing protocols etc. The mail servers generate a
public/private key pair and publish their public key as a part of their DNS
record. Each outgoing mail is signed with the secret private key. The
signature can then be used to verify that the mail is not forged. In this way
the presence or lack of a valid signature can be used to classify mail as spam
or ham.


4.3.1 How it works?


A rough overview of the working of Domain Keys is given below.


1. Generate a private/public key pair and add the public key to the
TXT field of the DNS record.


2. Each outgoing mail is signed with the private key (rsa-sha1), and the
signature is added to the e-mail header.


3. The receiving SMTP server verifies the sender by checking the signature
against the public key available in the TXT field of the DNS record
of the sending domain.


4. If the signature is verified to be from the server's public/private key
pair, the message is accepted.


5. If the test fails, the message is rejected.
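
The sign-and-verify cycle in the steps above can be demonstrated with a toy. Everything here is an assumption for illustration: the key is built from two well known Mersenne primes and is far too small to be secure, the message is hashed with SHA1 and signed with "textbook" RSA without the padding and header canonicalization a real implementation needs, and the function names are mine. It needs Python 3.8 or later for the modular inverse.

```python
import hashlib

# Toy key pair (NOT secure): two Mersenne primes as the RSA factors.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                               # public modulus, published via DNS
E = 65537                               # public exponent, published via DNS
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent, kept on the server


def digest(message: bytes) -> int:
    """SHA1 of the message, reduced so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Step 2: the sending server signs with its private key."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Steps 3 to 5: the receiver checks with the public key only."""
    return pow(signature, E, N) == digest(message)
```

The point of the design is visible even in the toy: verify() needs only N and E, which anyone can fetch from DNS, while producing a signature that verifies requires D, which never leaves the sending server.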


4.3.2 An Example: gmail.com 


If we check the headers of the mails coming from gmail.com we can see 
the following fields. 


DomainKey-Signature: a=rsa-sha1; c=nofws;
  s=beta; d=gmail.com;
  h=received:message-id:date:from:reply-to:to:subject:
  mime-version:content-type:content-transfer-encoding;
  b=UHWIVAnN9..... jw5mJ7H+A


It seems Gmail launched Domain Keys around 15th September 2004,
and it is still in the beta stage. The various tokens in the header are
explained below.


• s=beta is the selector, i.e. “beta” names the key to look up in the
sending domain's DNS.
• d=gmail.com is the sending domain’s name.
• a=rsa-sha1 is the algorithm used to generate the signature.
• b=UHWIVA...7H+A is the signature of the message.


The following DNS query can be used to get the domain key information
for a domain.


dig user._domainkey.domain.com TXT


The Domain Keys information for Gmail can be found by issuing “dig
beta._domainkey.gmail.com TXT” from the command prompt.
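
The name to query can be derived from the header itself. The following sketch splits a DomainKey-Signature value into its tag=value pairs and builds the DNS name from the s= and d= tags; the two function names are made up for the example, and the header value used in the usage note is a shortened stand-in, not a real signature.

```python
from typing import Dict


def parse_domainkey_signature(value: str) -> Dict[str, str]:
    """Split 'a=rsa-sha1; s=beta; ...' into a dict of tags."""
    tags = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only; the b= value may itself contain '='.
        name, _, val = part.partition("=")
        # Folded header lines may leave whitespace inside a value; drop it.
        tags[name.strip()] = "".join(val.split())
    return tags


def domainkey_query_name(tags: Dict[str, str]) -> str:
    """Build the DNS name that holds the public key (TXT record)."""
    return f"{tags['s']}._domainkey.{tags['d']}"
```

For a header value like "a=rsa-sha1; c=nofws; s=beta; d=gmail.com; ..." this yields beta._domainkey.gmail.com, the same name used in the dig command above.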


5 Conclusion 


These are some of the general methods used for spam prevention. New
protocols like Sender ID from Microsoft are in the development stage,
and some time in the near future we can expect to live in a world without
spam and spammers.


6 Addendum 


1. While writing this article, a friend told me about using content filtering
and Bayesian logic for CRM (Customer Relationship Management).
A company usually manages communication via various mail addresses
like info@company.com, careers@company.com etc. By using Bayesian
filtering we can avoid using multiple mail addresses or sorting mails manually
based on the subject line etc. The system will require initial training,
and after that it will automatically do the work for us. Spam can be
automatically deleted. If there are any unclassified mails we can sort them
manually. This topic is outside the scope of this article, so I will try to
compile a separate document on it sometime in the future.


2. Popmail is an excellent program which can be used on the client side to
filter spam. The main advantage of the program is that it performs the
filtering at the server and deletes spam from the server itself. This program
can be used in combination with programs like fetchmail. The Popmail
project is hosted on Sourceforge.net.